Model selection for linear classifiers using Bayesian error estimation

Authors

  • Heikki Huttunen
  • Jussi Tohka
Abstract

Regularized linear models are important classification methods for high-dimensional problems, where they are often preferred for their ability to avoid overfitting. The degrees of freedom of the model are determined by a regularization parameter, which is typically selected using counting-based approaches such as K-fold cross-validation (CV). For large data sets, this can be very time-consuming; for small sample sizes, the accuracy of model selection is limited by the large variance of the CV error estimates. In this paper, we study the applicability of a recently proposed Bayesian error estimator for selecting the best model along the regularization path. We also propose an extension of the estimator that allows model selection in multiclass cases, and study its efficiency with L1-regularized logistic regression and the L2-regularized linear support vector machine. Model selection with the new Bayesian error estimator is experimentally shown to improve classification accuracy, especially in small-sample situations, and to avoid the excess variability inherent in traditional cross-validation approaches. Moreover, the method has significantly lower computational complexity than cross-validation.
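
For orientation, the counting-based baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration and not the authors' code: it selects a regularization parameter by K-fold cross-validation over a small grid, using an L2-regularized logistic regression fitted by plain gradient descent as a stand-in for the L1 logistic regression and linear SVM named in the abstract. The function names, the toy data, and the grid of values are all assumptions made for the example; the Bayesian error estimator itself is not reproduced here.

```python
import math
import random


def fit_logreg_l2(X, y, lam, steps=200, lr=0.5):
    """Fit L2-regularized logistic regression by plain gradient descent.

    Stand-in model (assumption): the paper uses L1 logistic regression
    and an L2 linear SVM, which are not implemented here.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [lam * wj for wj in w], 0.0  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
            for j in range(d):
                gw[j] += (p - yi) * xi[j] / n
            gb += (p - yi) / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def error_rate(w, b, X, y):
    """Fraction of misclassified samples (the 'counting' estimate)."""
    wrong = 0
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
        wrong += pred != yi
    return wrong / len(X)


def kfold_cv_error(X, y, lam, k=5, seed=0):
    """Average held-out error over k folds for one regularization value."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errs = []
    for fold in folds:
        held = set(fold)
        X_tr = [X[i] for i in idx if i not in held]
        y_tr = [y[i] for i in idx if i not in held]
        w, b = fit_logreg_l2(X_tr, y_tr, lam)  # one full refit per fold
        errs.append(error_rate(w, b, [X[i] for i in fold], [y[i] for i in fold]))
    return sum(errs) / k


def select_lambda(X, y, lambdas, k=5):
    """Pick the grid value with the lowest K-fold CV error."""
    cv = {lam: kfold_cv_error(X, y, lam, k) for lam in lambdas}
    return min(lambdas, key=lambda lam: cv[lam]), cv


# Toy two-class data (hypothetical): 30 points per class in 2 dimensions.
rng = random.Random(1)
X = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)] for c in (-1.0, 1.0) for _ in range(30)]
y = [0] * 30 + [1] * 30
lambdas = [0.001, 0.01, 0.1, 1.0]
best_lam, cv_errors = select_lambda(X, y, lambdas)
```

The sketch makes the cost structure visible: each candidate value requires K refits, which is the computational burden the abstract refers to, and with small samples each fold's error is computed from only a handful of held-out points, which is where the variance of the estimate comes from.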

Similar resources

Comparison of Linear and Threshold Models for Estimation Genetic and Phenotypic Parameters of Success of Conception at First Service and Inseminations to Conception in Holstein Cattles in East Azarbayjan Province

In this research genetic and phenotypic parameters were estimated using linear and threshold models, for reproductive traits, data from 6 large industrial dairy herd of East Azerbaijan province collected by Agriculture Jihad Organization during 10 years (2001-2010). Best linear unbiased predictions of traits breeding values were estimated using Restricted Maximum Likelihood method by WOMBAT sof...

Incorporating External Information in Bayesian Classifiers Via Linear Feature Transformations

Naive Bayes classifier is a frequently used method in various natural language processing tasks. Inspired by a modified version of the method called the flexible Bayes classifier, we explore the use of linear feature transformations together with the Bayesian classifiers, because it provides us an elegant way to endow the classifier with an external information that is relevant to the task. Whi...

Optimal classifiers with minimum expected error within a Bayesian framework - Part II: Properties and performance analysis

In part I of this two-part study, we introduced a new optimal Bayesian classification methodology that utilizes the same modeling framework proposed in Bayesian minimum-mean-square error (MMSE) error estimation. Optimal Bayesian classification thus completes a Bayesian theory of classification, where both the classifier error and our estimate of the error may be simultaneously optimized and stu...

Comparison of Estimates Using Record Statistics from Lomax Model: Bayesian and Non Bayesian Approaches

This paper addresses the problem of Bayesian estimation of the parameters, reliability and hazard function in the context of record statistics values from the two-parameter Lomax distribution. The ML and the Bayes estimates based on records are derived for the two unknown parameters and the survival time parameters, reliability and hazard functions. The Bayes estimates are obtained based on conju...

Journal:
  • Pattern Recognition

Volume 48  Issue 

Pages  -

Publication date 2015